CT radiomics for prediction of microvascular invasion in hepatocellular carcinoma: A systematic review and meta-analysis

Highlights
• CT radiomics could preoperatively predict MVI in HCC with an AUC of 0.87.
• Radiomics models based on 3D tumor segmentation and deep learning models may be superior for predicting MVI.
• Reproducibility of current radiomics models for clinical application may be uncertain.


Introduction
Hepatocellular Carcinoma (HCC) is the sixth most prevalent malignancy globally [1]. Although extensive efforts have been made in the surveillance and treatment of HCC, recurrence within 5 years after hepatic surgery remains a major challenge [2]. Microvascular Invasion (MVI) is considered an independent predictor of postoperative recurrence and poor prognosis after surgical hepatic resection. For HCC patients with MVI, more aggressive treatment strategies, such as a wide resection margin and preoperative neoadjuvant therapy, should be performed to improve survival by eradicating micro-metastases [3]. Hence, an assessment of MVI status before surgery is of great clinical relevance in HCC treatment decision-making. However, MVI is a histologic diagnosis based on postoperative microscopic examination of surgical specimens [4], and preoperative prediction of MVI is still challenging [5]. Exploring new methods to preoperatively evaluate MVI status in HCC is therefore of great importance.
As an emerging approach that mines the hidden information in medical images by extracting high-throughput imaging features and converting them into mineable data for quantitative analysis, radiomics has also been used to predict MVI in HCC and has shown potential value for MVI prediction [6]. A number of radiomics models based on Computed Tomography (CT) data for MVI prediction have been constructed. However, owing to the methodologic variability in current CT radiomics research, such as differences in imaging phase, model construction, and sample size, the diagnostic power of CT radiomics for preoperative evaluation of MVI remains variable in the reported studies [7−17]. Hence, the authors searched relevant studies and performed this systematic review and meta-analysis to evaluate the value of CT radiomics for MVI prediction in HCC, and to investigate the methodologic quality in the workflow of the radiomics research.

Materials and methods
The study was registered prospectively in the International Prospective Register of Systematic Reviews (No.: CRD42022333822) and complied with the guidance of the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA). Ethical approval and informed consent were waived because the present study did not collect patient information nor influence patient care.

Literature search and study selection
All published relevant studies in English from the databases of PubMed, Embase, Web of Science, and Cochrane Library were systematically searched up to May 31, 2022. The search was performed according to the following terms: ((radiomics) OR (artificial Intelligence) OR (deep learning) OR (machine learning)) AND ((CT) OR (computed tomography)) AND ((hepatocellular carcinoma) OR (hepatoma) OR (hepatic tumor) OR (HCC) OR (liver cancer)) AND ((microvascular invasion) OR (MVI) OR (vascular invasion)). Reference lists of the included studies were also searched manually to recruit any potentially eligible studies.
After the removal of publications in the form of letters, conference abstracts, editorials, reviews, case reports, and duplicates, the studies that met the following criteria were included: 1) The patient population consisted of HCC patients with MVI confirmed by pathology after surgical resection or liver transplantation; 2) Radiomics based on CT images was performed for preoperative MVI prediction; and 3) The main result or one of the main results was the diagnostic accuracy of CT radiomics for predicting MVI in HCC. The authors excluded studies according to the following criteria: 1) Patients had received preoperative antitumor therapy, such as systemic chemotherapy, transarterial chemoembolization, or radiofrequency ablation; 2) A two-by-two table could not be constructed from the data; 3) The study was an animal experiment; or 4) The sample size of the study was less than 30.
All identified articles were first screened by title and abstract, and then full-text reviews of potentially eligible articles were performed independently by two authors (the first and the second authors with 12 and 3 years of radiological experience, respectively). Any disagreement was resolved by discussion to reach a consensus.

Data extraction
The following information was extracted from each paper by two authors in consensus (the first and the third authors with 12 and 23 years of radiological experience, respectively): 1) Study characteristics including authors, year of publication, study type, study design, and study country; 2) Subject characteristics including the total number of participants, the MVI-positive and MVI-negative cases, sensitivity, specificity, and Area Under the Receiver Operating Characteristic Curve (AUC). The numbers of True Positives (TPs), False Positives (FPs), False Negatives (FNs), and True Negatives (TNs) were calculated from the above-mentioned information in each included study; 3) Radiomics model characteristics including imaging phase, region segmentation, feature selection, clinical features, radiological features, modeling method, and validation method. In studies in which the data were split into training and validation cohorts, only validation data of type 2a or above according to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [18] were extracted for meta-analysis, to avoid the potential bias from the training processes of radiomics models in the training cohorts. If there were two or more radiomics models based on the same group of patients in one study, the model with the best diagnostic performance was included in the present meta-analysis.
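The reconstruction of a two-by-two table from reported accuracy figures can be illustrated with a minimal Python sketch. It assumes that sensitivity, specificity, and the numbers of MVI-positive and MVI-negative cases are available for a validation cohort; the function name and the example cohort are hypothetical and are not taken from any included study.

```python
def two_by_two(sensitivity, specificity, n_positive, n_negative):
    """Reconstruct (TP, FP, FN, TN) from reported accuracy and group sizes.

    Counts are rounded to the nearest integer, so the result is an
    approximation when the reported values were themselves rounded.
    """
    tp = round(sensitivity * n_positive)   # MVI-positive cases correctly predicted
    fn = n_positive - tp                   # MVI-positive cases missed
    tn = round(specificity * n_negative)   # MVI-negative cases correctly predicted
    fp = n_negative - tn                   # MVI-negative cases predicted as positive
    return tp, fp, fn, tn


# Hypothetical validation cohort: 40 MVI-positive and 60 MVI-negative patients.
print(two_by_two(0.80, 0.75, 40, 60))
```

Because published sensitivity and specificity are usually rounded, counts recovered this way may differ by one case from the true table and are best cross-checked against any counts reported in the paper.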
Assessment of radiomics quality score and study quality
The same two reviewers (the first and the third authors) assessed the methodologic quality of the included literature in consensus using the Radiomics Quality Score (RQS), a scoring system proposed by Lambin in 2017 that comprises 16 items in 6 domains [19]. Domain 1 assesses the quality and reproducibility of image and segmentation; domain 2, the reporting of feature reduction and validation; domain 3, biological validation and clinical utility; domain 4, model performance; domains 5 and 6, demonstration of high level of evidence and open science, respectively. The ideal score of the RQS is 36 points, corresponding to a percentage of 100%. The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was also used to evaluate the risk of bias and concern of application in the four domains including patient selection, index test, reference standard, and flow and timing [20]. The results of each domain were categorized as yes, no or unclear for the risk of bias, and low risk, high risk, or unclear for applicability concerns.
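The conversion of an RQS total into a percentage of the ideal 36 points can be sketched as follows. This is an illustrative helper only; the function name is hypothetical, and how negative totals are reported is left to the caller because the text does not specify it.

```python
RQS_MAX_POINTS = 36  # ideal RQS total stated in the text, corresponding to 100%


def rqs_percentage(total_points):
    """Express an RQS total as a percentage of the ideal 36 points."""
    return total_points / RQS_MAX_POINTS * 100


# Hypothetical study scoring 18 of 36 points.
print(rqs_percentage(18))
```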

Statistical analysis
The pooled sensitivity, specificity, Positive Likelihood Ratio (PLR), and Negative Likelihood Ratio (NLR) were calculated. Then a Summary Receiver Operating Characteristic (SROC) curve was drawn, and the Area Under the SROC Curve (AUC) was used to evaluate the diagnostic power of the included studies on MVI prediction. An AUC of more than 0.9 indicated a high diagnostic value, while values between 0.7 and 0.9 and less than 0.7 indicated moderate and low diagnostic value, respectively.
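As an illustration of the quantities above, the following Python sketch computes the likelihood ratios from a single sensitivity and specificity pair and applies the AUC thresholds stated in this section. It is not the pooling procedure of Meta-DiSc or STATA; the function names and the example values are assumptions made for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR) using the standard definitions.

    PLR = sensitivity / (1 - specificity)
    NLR = (1 - sensitivity) / specificity
    """
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return plr, nlr


def auc_category(auc):
    """Classify diagnostic value with the thresholds stated in the text."""
    if auc > 0.9:
        return "high"
    if auc >= 0.7:
        return "moderate"
    return "low"


# Hypothetical single-study values.
plr, nlr = likelihood_ratios(0.80, 0.75)
print(round(plr, 2), round(nlr, 2), auc_category(0.87))
```

Under these thresholds, an AUC of 0.87 falls in the moderate range.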
Forest plots were drawn, and I² was used to detect heterogeneity among the included studies. I² > 50% was regarded as substantial heterogeneity. To investigate the potential sources of heterogeneity, meta-regression and subgroup analysis of several relevant covariates were performed according to the imaging phase, region segmentation (3D or 2D), algorithm for feature extraction and selection (deep learning or non-deep learning), combined clinical features or radiological features (yes or no), and modeling method (deep learning or non-deep learning). Additionally, Deeks' funnel plot and Deeks' asymmetry test were performed to assess the publication bias. All statistical analyses were carried out with Meta-DiSc version 1.4 and STATA version 16.0 (StataCorp LP, College Station, TX, USA).
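The I² statistic can be illustrated with a short Python sketch based on the standard definition from Cochran's Q, I² = 100% × (Q − df) / Q, truncated at zero. The inputs are assumed to be per-study effect estimates on a suitable scale (for example, logit sensitivity) with their variances; the function name and the example numbers are hypothetical, and the software used in this study computes the statistic internally.

```python
def i_squared(effects, variances):
    """Return I² (in percent) from per-study effects and their variances."""
    weights = [1.0 / v for v in variances]                  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    if q <= 0:
        return 0.0                                          # no observed variation
    return max(0.0, (q - df) / q) * 100


# Hypothetical effects from three studies with equal variances.
value = i_squared([1.0, 2.0, 3.0], [0.01, 0.01, 0.01])
print(round(value, 1), "substantial" if value > 50 else "not substantial")
```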

Literature selection and general characteristics of the included studies
The study selection procedure is depicted systematically in Fig. 1. In total, 11 studies published between April 2018 and May 2022, with 63.6% (7/11) published within the most recent three years (2020−2022) [7−17], were eligible for this systematic review and meta-analysis. A total of 3298 HCC patients were studied, of whom 1344 (40.8%) were MVI-positive and 1954 (59.2%) were MVI-negative. More details about the general characteristics of the included studies are shown in Table 1.

Radiomics model characteristics
The radiomics model characteristics are summarized as follows according to the typical workflow of radiomics research (Table 1).

Region segmentation
Among the 11 enrolled studies, 3D and 2D tumor segmentation was performed in 6 and 5 studies, respectively. The six studies with 3D tumor segmentation used manual segmentation in four studies (4/6) and semiautomatic segmentation in two (2/6). The 2D segmentation was performed on the axial slice with the largest tumor diameter in the remaining five studies (5/11). In these five studies, the 2D segmentation was drawn manually in three (3/5), semi-automatically in one (1/5), and automatically in one (1/5).

Feature extraction and selection
In the included studies, the most commonly used algorithm for feature extraction and selection was Least Absolute Shrinkage and Selection Operator (LASSO) regression (5/11), followed by Convolutional Neural Network (CNN) (2/11) and support vector machine (2/11).

Modeling
Eight studies (8/11) constructed a non-deep learning model, most of which (5/8) used logistic regression; the remaining three studies (3/11) constructed a deep learning model with CNN or 3D CNN. Clinical risk factors and/or radiological features were used to construct a combined prediction model in 9/11 studies, among which five studies included clinical risk factors, one study included radiological features, and three studies included both. The commonly used clinical and radiological features included Hepatitis B Surface Antigen (HBsAg) or Hepatitis C Virus Antibody (HCVAb) status, Alpha-Fetoprotein (AFP), Child-Pugh score, Aspartate Aminotransferase (AST), tumor size, non-smooth tumor margin, ill-defined pseudo-capsule, peritumoral arterial enhancement, and portal vein tumor thrombosis.

Validation method
In the 11 enrolled studies, the research subjects were randomly divided into a training cohort and an internal validation cohort at a certain ratio in seven studies, and into a training cohort and internal and external validation cohorts in two studies. The remaining two studies had no validation.
The items of image protocol, multiple segmentation, feature reduction, cut-off analyses, comparison with the gold standard, and discrimination statistics were performed in all the studies. In nine of the 11 studies, a validation test was performed, but only two of the nine studies applied an external validation and were assigned 3 points. The remaining two studies (2/11) had no validation and were assigned −5 points. Owing to the lack of prospective studies, phantom studies on all scanners, imaging at multiple time points, discussion of biological correlates, cost-effectiveness analysis, and open science and data, all 11 included studies scored zero points for these items.
The results of the risk of bias and the applicability concerns assessed by the QUADAS-2 tool are shown in Fig. 2. A majority of studies showed a low or unclear risk of bias in each domain.

Publication bias
Deeks' funnel plot (Fig. 3) was relatively symmetrical, and the slope coefficient of Deeks' asymmetry test was not statistically significant (p > 0.05), suggesting no evidence of publication bias among the included studies.
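The regression underlying Deeks' funnel plot can be sketched in Python as follows. The sketch assumes the usual formulation, in which the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size, weighted by the effective sample size. It returns only the intercept and slope and does not compute the p-value that STATA reports; the function name and the example counts are hypothetical.

```python
import math


def deeks_regression(studies):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS).

    studies: list of (TP, FP, FN, TN) tuples.
    Returns (intercept, slope); a slope far from zero suggests asymmetry.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        n_pos, n_neg = tp + fn, fp + tn
        ess = 4 * n_pos * n_neg / (n_pos + n_neg)   # effective sample size
        if 0 in (tp, fp, fn, tn):
            # Continuity correction so the odds ratio stays finite.
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))  # ln(diagnostic odds ratio)
        ws.append(ess)
    total_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical studies of different sizes that share the same odds ratio.
print(deeks_regression([(40, 10, 10, 40), (80, 20, 20, 80), (20, 5, 5, 20)]))
```

When study size is unrelated to the diagnostic odds ratio, as in the hypothetical input above, the slope is close to zero and the funnel plot appears symmetrical.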

Meta-regression and subgroup analyses
As substantial heterogeneity among the included studies was suggested by the I² values of sensitivity, specificity, PLR, and NLR (all I² > 50%), meta-regression analysis was performed. The results showed that region segmentation (3D or 2D) and modeling method (deep learning or non-deep learning) contributed to the study heterogeneity (p = 0.017 and 0.002, respectively).
The results of subgroup analyses are shown in Table 2. In terms of region segmentation, the pooled sensitivity, specificity, and AUC were higher in studies with 3D region segmentation than in those with 2D segmentation. The predictive models constructed by deep learning showed a higher diagnostic power than those constructed by non-deep learning.

Discussion
The present study showed CT radiomics could be an efficient method to preoperatively predict MVI in HCC, with an AUC of 0.87. Radiomics models based on 3D region segmentation and deep learning achieved superior performances compared to 2D segmentation and non-deep learning, respectively. However, the methodologic quality of the included literature was insufficient.
Because of its high availability and low cost, CT is widely used for HCC examination. Although conventional CT features represent relatively few metrics for MVI prediction in HCC [21], CT radiomics can transform raw images into numerous quantitative features that reflect intrinsic tumor pathophysiology. Thus, CT radiomics provides more possibilities for MVI prediction [19,22]. However, because the imaging phases varied among the included studies, the number of studies with each imaging phase was relatively small in the present meta-analysis. As a result, subgroup analysis could not be performed to evaluate the prediction power of CT radiomics models based on each phase, and the best phase for MVI prediction in radiomics research could not be recommended.
Given that the radiomics workflow involves multiple steps and that each step can be performed by several different strategies and approaches [6,19], the heterogeneity among the included radiomics studies was high in the present meta-analysis. Meta-regression analysis demonstrated that region segmentation (3D or 2D) and modeling method (deep learning or non-deep learning) contributed to the study heterogeneity in CT radiomics. The radiomics model based on 3D tumor segmentation achieved a superior performance for MVI prediction compared to 2D segmentation. The probable reason is that the volumes of interest derived from 3D tumor segmentation can provide the entire volumetric imaging features of the tumor and might be less influenced by hand-related artifacts. Meanwhile, the deep learning model was demonstrated to have a higher prediction power than the non-deep learning model. As a promising technique to learn features associated with a predefined task, deep learning has an advantage in learning features from the raw images without precise annotations. In the learning process, feature extraction is not required, which avoids the defects of human-designed features in radiomics analysis. Compared with the non-deep learning methods used in radiomics analysis, deep learning needs less manpower and time for MVI prediction, and it has been shown to be more powerful in various challenging clinical tasks [23,24]. Hence, it is expected to improve the efficiency and reliability of constructed models.
Despite the promising results of the present meta-analysis, the overall methodologic quality of the included literature was insufficient, reducing the reliability and repeatability of the radiomics models for clinical implementation. The lack of prospective studies, phantom studies on all scanners, imaging at multiple time points, discussion of biological correlates, cost-effectiveness analysis, and open science and data contributed to the low RQS scores. Moreover, although internal validation was performed in most studies, independent external validation was lacking.
In the future, the RQS should be used not only to assess the methodologic quality of radiomics research but also to guide radiomics study design, and it could even serve as a routine self-checklist before manuscript submission.
This systematic review and meta-analysis has several limitations. Firstly, all included studies were designed retrospectively, which may cause a patient selection bias. Secondly, because the number of studies with each CT imaging phase was relatively small, the best CT phase for MVI prediction in radiomics research could not be recommended. Thirdly, the authors did not examine whether CT radiomics-based prediction of MVI in HCC can be used to assess prognosis and treatment, because the aims of this systematic review and meta-analysis were to evaluate the value of CT radiomics for MVI prediction in HCC and to investigate the methodologic quality in the workflow of the radiomics research.
In conclusion, this systematic review and meta-analysis demonstrates that CT radiomics could be an efficient method for preoperative MVI prediction in HCC. Radiomics models based on 3D tumor segmentation and deep learning could achieve superior performances compared to 2D segmentation and non-deep learning, respectively. However, the heterogeneity of the included studies precludes a definition of the role of CT radiomics in predicting MVI. It is necessary to design prospective studies with an external validation cohort in accordance with a standardized radiomics workflow and RQS items in the future to enhance the reliability and reproducibility of the radiomics models for clinical application.